Udacity Deep Reinforcement Learning Nanodegree Program
This repository contains Christian Motschenbacher's solution to the first project of the Udacity Deep Reinforcement Learning Nanodegree Program.
Name: Christian Motschenbacher
Date: 01/2019
Project: Udacity Deep Reinforcement Learning Nanodegree Program: Project 1 Navigation (Banana)
This file describes the project environment details, such as the state and action spaces and when the environment is considered solved. It also explains how to get started by installing the dependencies and downloading the necessary files onto your local computer. Furthermore, it describes how to run the code in the repository to train and test the agent.
The task of this project was to train an agent that navigates a large environment, collecting as many yellow bananas as possible while avoiding blue bananas. The following video shows a trained agent performing this task.
A reward of +1 is provided for collecting a yellow banana, and a reward of -1 is provided for collecting a blue banana. Thus, the goal of the agent is to collect as many yellow bananas as possible while avoiding blue bananas.
The state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction. Given this information, the agent has to learn how to best select actions. Four discrete actions are available, corresponding to:
- 0 - move forward.
- 1 - move backward.
- 2 - turn left.
- 3 - turn right.

The task is episodic, and in order to solve the environment, the agent must achieve an average score of +13 over 100 consecutive episodes.
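As an illustration of how an agent chooses among these four actions, the following epsilon-greedy selection sketch (a standard DQN ingredient, not code taken from this repository) picks a random action with probability epsilon and otherwise the action with the highest estimated Q-value:

```python
import numpy as np

# Discrete actions in the Banana environment
MOVE_FORWARD, MOVE_BACKWARD, TURN_LEFT, TURN_RIGHT = 0, 1, 2, 3
NUM_ACTIONS = 4

def epsilon_greedy(q_values, eps, rng=np.random.default_rng()):
    """Pick a random action with probability eps, otherwise the greedy one."""
    if rng.random() < eps:
        return int(rng.integers(NUM_ACTIONS))
    return int(np.argmax(q_values))
```

During training, eps typically starts near 1.0 (mostly exploration) and is decayed towards a small value such as 0.01 (mostly exploitation).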
The training and testing notebook "Navigation_solution_training_testing.ipynb" in this folder contains the chapter "Examine the State and Action Spaces", where you can find an example of the state and action spaces as well as more detailed information about the environment and the project.
The solution to this task uses the Banana environment from Unity-Technologies, the Python programming language, and libraries such as NumPy and PyTorch.
This DRL project uses three main software components: Python, the Udacity DRL course dependencies, and the Unity-Technologies/ml-agents dependencies. Their installation is described in the following subsections.
To set up the Python environment, run the following code and follow the instructions below.
Create (and activate) a new environment with Python 3.6 including the anaconda packages.
The following commands show how to install the environment on Linux, Mac or Windows. Write or copy-paste them into the command line of your system.
The three commands in each of the following code sections create the environment, activate it, and deactivate it.

Linux or Mac:

```
conda create --name drlnd python=3.6 anaconda
source activate drlnd
conda deactivate
```

Windows:

```
conda create --name drlnd python=3.6 anaconda
activate drlnd
deactivate
```
Note: Activation and deactivation of the environment.
Command (... activate drlnd): This command needs to be executed before any additional Python library is installed into this environment.
Command (... deactivate): This command needs to be executed before the user works with another environment.
For the setup of this project, once the first command has been executed, the environment can be activated and remain activated until all dependencies have been installed.
Download this repository
Before you continue with the next dependencies, download this repository onto your local computer so that you can follow the remaining sections.
For easier setup of this project, the Udacity DRL course dependencies have been extracted from the repository deep-reinforcement-learning and copied into this repository. The user can therefore navigate into the folder "./Installation/DRL_Environment/python" of this downloaded repository and run the command pip install . in the command line of the local PC to install the dependencies. More information about the installation, if needed, can be found in the Udacity repository linked above.
For easier setup of this project, the Unity-Technologies/ml-agents dependencies have been extracted from the repository Unity-Technologies/ml-agents and copied into this repository. The user can therefore navigate into the folder "./Installation/Unity_Technologies_ml_agents/ml-agents/" of this downloaded repository and run the command pip install -e . in the command line of the local PC to install the dependencies. More information about the installation, if needed, can be found in the Unity-Technologies/ml-agents repository linked above.
For this project, you do not need to install Unity - the environments for the different operating systems have already been built for you, and you can download them from the links below. You only need to select the environment that matches your operating system:
Then, place the file in the "./Navigation_notebook_resources/Environment/" folder of this downloaded GitHub repository and unzip (or decompress) the file.
(For Windows users) Check out this link if you need help with determining if your computer is running a 32-bit version or 64-bit version of the Windows operating system.
Create an IPython kernel for the drlnd environment. Run the following code in the command line of your computer:
python -m ipykernel install --user --name drlnd --display-name "drlnd"
Before running code in the notebook, change the kernel to match the drlnd environment by using the drop-down Kernel menu.

<img src = "Navigation_notebook_resources/Picture/UdacityDRL03.png" width = 550, align = "center">
Once you have installed the previous dependencies, you can continue with the following instructions on how to run the code.
The notebook Navigation_solution_training_testing.ipynb, which is in the root folder of this repository, contains the training and testing code of the agent in the environment. The first part explains and shows the performance (testing) of an untrained agent, the middle part explains how to train the agent, and the bottom part explains and shows the performance (testing) of a trained agent. It is not required to run the whole notebook from the beginning in order to run the testing section, because the trained weights of the neural network (NN) model have been stored in the folder ./Navigation_notebook_resources/Model_weights/. This means, for instance, that the user can start this notebook, run the code cells "load libraries", "create environment" and "create agent instance", and then run the testing code at the end of the notebook. Beyond that, the notebook is well documented, easy to follow and self-explanatory.
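As an illustration of the testing step, a minimal sketch of loading stored weights and acting greedily could look like the following. The layer sizes and the checkpoint file name are assumptions for illustration, not necessarily the exact names used in this repository:

```python
import torch
import torch.nn as nn

# Banana environment sizes: 37-dimensional state, 4 discrete actions
STATE_SIZE, ACTION_SIZE = 37, 4

# Hypothetical Q-network; the repository's actual architecture is in model.py
qnetwork = nn.Sequential(
    nn.Linear(STATE_SIZE, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, ACTION_SIZE),
)

# Load trained weights (file name assumed for illustration):
# qnetwork.load_state_dict(torch.load(
#     "./Navigation_notebook_resources/Model_weights/checkpoint.pth"))

def act_greedy(state):
    """Return the action with the highest predicted Q-value."""
    with torch.no_grad():
        q_values = qnetwork(torch.as_tensor(state, dtype=torch.float32))
    return int(q_values.argmax())
```

During testing no exploration is needed, so the agent can act fully greedily with respect to the loaded Q-network.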
The "agent class" is in the file dqn_agent.py, which is in the root folder of this repository. It contains the DQN agent with all the elements of the DQN algorithm except the NN models, which are in the file model.py in the same folder. This solution implementation evaluated several DQN NN models, which can be selected at the beginning of the file dqn_agent.py by uncommenting the desired model and commenting out the others. The code in the file is well documented, easy to follow and self-explanatory.
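Two core elements that a DQN agent class typically implements are the TD target for the Q-learning update and the soft update of the target network. The following numpy sketch illustrates both; it is a simplified illustration of the standard algorithm, not code taken from dqn_agent.py:

```python
import numpy as np

GAMMA = 0.99  # discount factor
TAU = 1e-3    # soft-update interpolation factor

def td_targets(rewards, next_q_max, dones, gamma=GAMMA):
    """DQN target: y = r + gamma * max_a' Q_target(s', a'), zeroed for terminal states."""
    return rewards + gamma * next_q_max * (1.0 - dones)

def soft_update(local_params, target_params, tau=TAU):
    """theta_target <- tau * theta_local + (1 - tau) * theta_target."""
    return [tau * l + (1.0 - tau) * t
            for l, t in zip(local_params, target_params)]
```

The soft update slowly blends the local network's weights into the target network, which keeps the bootstrapped targets stable during training.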
The "fully connected NN model classes" are in the file model.py, which is in the root folder of this repository. It contains the different fully connected NN model classes with all the elements of the NN architecture. The code in the file is well documented, easy to follow and self-explanatory.
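A fully connected Q-network for this environment (37-dimensional state, 4 actions) might look like the following PyTorch sketch. The class name and layer sizes are assumptions for illustration, not necessarily those used in model.py:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per action."""

    def __init__(self, state_size=37, action_size=4, fc1_units=64, fc2_units=64):
        super().__init__()
        self.fc1 = nn.Linear(state_size, fc1_units)
        self.fc2 = nn.Linear(fc1_units, fc2_units)
        self.fc3 = nn.Linear(fc2_units, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # raw Q-values, no activation on the output layer
```

The output layer has no activation because Q-values are unbounded; the agent selects actions by taking the argmax over the network's output.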
Further resources and references regarding this project and DQN can be found in the following links.
Riedmiller, Martin. "Neural fitted Q iteration–first experiences with a data efficient neural reinforcement learning method." European Conference on Machine Learning. Springer, Berlin, Heidelberg, 2005. http://ml.informatik.uni-freiburg.de/former/_media/publications/rieecml05.pdf
Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533. http://www.davidqiu.com:8888/research/nature14236.pdf
Hado van Hasselt, Arthur Guez, David Silver "Deep Reinforcement Learning with Double Q-learning" arXiv. https://arxiv.org/abs/1509.06461
Tom Schaul, John Quan, Ioannis Antonoglou, David Silver "Prioritized Experience Replay" arXiv. https://arxiv.org/abs/1511.05952
Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas "Dueling Network Architectures for Deep Reinforcement Learning" arXiv. https://arxiv.org/abs/1511.06581
Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu "Asynchronous Methods for Deep Reinforcement Learning" arXiv. https://arxiv.org/abs/1602.01783
Marc G. Bellemare, Will Dabney, Rémi Munos "A Distributional Perspective on Reinforcement Learning" arXiv. https://arxiv.org/abs/1707.06887
Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg "Noisy Networks for Exploration" arXiv. https://arxiv.org/abs/1706.10295
Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver "Rainbow: Combining Improvements in Deep Reinforcement Learning" arXiv. https://arxiv.org/abs/1710.02298